Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits

Abstract

In this paper, we study the problem of estimating uniformly well the mean values of several distributions given a finite budget of samples. If the variance of the distributions were known, one could design an optimal sampling strategy by collecting a number of independent samples per distribution that is proportional to their variance. However, in the more realistic case where the distributions are not known in advance, one needs to design adaptive sampling strategies in order to select which distribution to sample from according to the previously observed samples. We describe two strategies based on pulling the distributions a number of times that is proportional to a high-probability upper-confidence-bound on their variance (built from previous observed samples) and report a finite-sample performance analysis on the excess estimation error compared to the optimal allocation. We show that the performance of these allocation strategies depends not only on the variances but also on the full shape of the distributions.
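
As a rough illustration of the two ideas in the abstract, the Python sketch below first computes the static allocation that is proportional to known variances, then runs a simple adaptive rule that always samples the distribution whose optimistic variance estimate divided by its sample count is largest. The function names, the `bonus_scale` parameter, and the form of the confidence bonus are assumptions made for this example; they are not the paper's exact algorithms or constants.

```python
import math
import random


def optimal_allocation(variances, budget):
    """Static allocation when variances are known: samples proportional to variance."""
    total = sum(variances)
    return [budget * v / total for v in variances]


def ucb_variance_allocation(samplers, budget, bonus_scale=1.0, seed=0):
    """Adaptive allocation driven by an upper confidence bound on each variance.

    `samplers` is a list of callables taking a `random.Random` and returning
    one sample. The bonus term below is an illustrative choice (hypothetical),
    not the bound derived in the paper.
    """
    k = len(samplers)
    if budget < 2 * k:
        raise ValueError("budget must allow two initial samples per distribution")

    rng = random.Random(seed)
    counts = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k

    def pull(i):
        # Draw one sample from distribution i and update its running statistics.
        x = samplers[i](rng)
        counts[i] += 1
        sums[i] += x
        sq_sums[i] += x * x

    def index(i):
        # Optimistic variance estimate divided by the number of samples so far.
        t = counts[i]
        mean = sums[i] / t
        var = max(sq_sums[i] / t - mean * mean, 0.0)
        bonus = bonus_scale * math.sqrt(math.log(budget) / t)
        return (var + bonus) / t

    # Two initial samples per distribution so a variance estimate exists.
    for i in range(k):
        pull(i)
        pull(i)

    # Spend the remaining budget on the distribution with the largest index.
    for _ in range(budget - 2 * k):
        pull(max(range(k), key=index))

    means = [sums[i] / counts[i] for i in range(k)]
    return counts, means
```

With samplers such as `lambda rng: rng.gauss(0.0, 2.0)`, the returned counts should grow with the variance of each distribution, which is the behaviour the known-variance allocation prescribes. Shrinking `bonus_scale` makes the rule trust its empirical variances sooner, at the cost of less exploration early on.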

Bibliographic record

Authors: Carpentier, Alexandra; Lazaric, Alessandro; Ghavamzadeh, Mohammad; Munos, Rémi; Auer, Peter; Antos, András
Year: 2015
Original format: PDF

Similar literature

1. Rethinking the Gold Standard With Multi-armed Bandits: Machine Learning Allocation Algorithms for Experiments [J]. Kaibel Chris, Biemann Torsten. Organizational Research Methods, 2021, No. 1.
2. Foraging decisions as multi-armed bandit problems: Applying reinforcement learning algorithms to foraging data [J]. Morimoto Juliano. Journal of Theoretical Biology, 2019.
3. Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems [J]. Koulouriotis DE, Xanthopoulos A. Applied Mathematics and Computation, 2008, No. 2.
4. Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits [C]. Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh. Algorithmic Learning Theory, 2011.
5. Offline Evaluation of Multi-Armed Bandit Algorithms Using Bootstrapped Replay on Expanded Data [D]. Dai, Jin. 2021.
6. Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm [O]. Emanuele Cavenaghi, Gabriele Sottocornola, Fabio Stella. 2021.
7. Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits [O]. Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh. 2012.
8. Learning in A Changing World: Non-Bayesian Restless Multi-Armed Bandit [R]. Liu, H., Liu, K., Zhao, Q. 2010.
